[WIP]Add Func: aclgraph_batch_size auto-adjust to different model #739
Conversation
parallel_type_cnt = 0
dp_size = self.vllm_config.parallel_config.data_parallel_size
tp_size = self.vllm_config.parallel_config.tensor_parallel_size
if dp_size > 1:
So the bigger the parallel size, the smaller the graph step? It should be bigger, right?
The number of parallel strategy types influences the length of the list: the more types of parallel strategies in use, the shorter the list becomes. However, the maximum supported batch_size value in the list remains unchanged.
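A rough sketch of that behaviour, with an illustrative thinning rule (the function name and the stride formula are assumptions, not the PR's actual code): the list gets shorter as more parallel strategy types are used, but its maximum batch size is preserved.

def thin_capture_sizes(capture_sizes, parallel_type_cnt):
    # Illustrative only: keep every (parallel_type_cnt + 1)-th entry.
    kept = capture_sizes[::parallel_type_cnt + 1]
    # The maximum supported batch size always stays in the list.
    if capture_sizes and capture_sizes[-1] not in kept:
        kept.append(capture_sizes[-1])
    return kept

sizes = list(range(1, 65))
print(len(thin_capture_sizes(sizes, 0)), max(thin_capture_sizes(sizes, 0)))  # 64 64
print(len(thin_capture_sizes(sizes, 2)), max(thin_capture_sizes(sizes, 2)))  # 22 64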
from torch_npu.op_plugin.atb._atb_ops import _register_atb_extensions
from vllm import LLM, SamplingParams

_register_atb_extensions()
what does this do?
torch_npu needs to preload ATB's .so before the dynamo trace procedure.
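A minimal sketch of the call ordering this comment describes, reusing the imports from the diff above (the prompt and sampling settings are placeholders): the registration runs before anything that could trigger dynamo tracing.

from torch_npu.op_plugin.atb._atb_ops import _register_atb_extensions
from vllm import LLM, SamplingParams

_register_atb_extensions()  # preload ATB's .so first

# Only afterwards build the engine and generate, which may trigger
# torch.compile / dynamo tracing of the model.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
print(llm.generate(["Hello"], SamplingParams(max_tokens=8)))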
"_register_atb_extensions()" has been removed
"Qwen/Qwen2.5-0.5B-Instruct", | ||
] | ||
|
||
TENSOR_PARALLELS = [2] |
This is a multi-card UT; let's move it to tests/multicard to make sure it is tested as expected.
It has been moved to tests/multicard.
Please don't merge this PR; we may still need to discuss it with the torch_npu and CANN teams. This solution neither follows the CUDA behavior nor is good for performance.
Please replace all occurrences of npugraph with aclgraph.
For now, it seems we don't have much choice here: for a large model with many layers and comm groups, we may only be able to keep a small number of aclgraphs cached in memory. That means enormous padding may happen in many scenarios and thus cause performance regression. cc @wangxiyuan @Yikun
What this PR does / why we need it?
This PR adds a new function: aclgraph_batch_size can dynamically adjust to different models. Before this PR, the aclgraph_batch_sizes passed from vLLM to vLLM Ascend were always too large, which could result in an error while running different models, with the message: "The resources are insufficient".
Now, with this PR, the code can dynamically adjust aclgraph_batch_sizes depending on the model's number of hidden layers and the parallel config. For example (see the sketch below):
a. for Qwen2.5-7B, the aclgraph_batch_size list has 33 entries in total;
b. for Qwen2.5-72B, the aclgraph_batch_size list has 11 entries in total.
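A minimal sketch of the kind of adjustment described above; the function name, the stride rule, and the layer threshold are illustrative assumptions rather than the PR's actual implementation. The idea is that deeper models and more parallel strategy types leave room for fewer cached graphs, so the candidate list is thinned out while the maximum batch size is kept.

def compute_aclgraph_batch_sizes(candidate_sizes, num_hidden_layers,
                                 dp_size=1, tp_size=1):
    # Count how many parallel strategy types are active.
    parallel_type_cnt = int(dp_size > 1) + int(tp_size > 1)
    # Deeper models and more parallelism -> larger stride -> shorter list,
    # so fewer aclgraphs have to be cached in memory.
    stride = 1 + parallel_type_cnt + num_hidden_layers // 24
    kept = candidate_sizes[::stride]
    # The maximum supported batch size is always kept.
    if candidate_sizes and candidate_sizes[-1] not in kept:
        kept.append(candidate_sizes[-1])
    return kept

candidates = [1, 2, 4] + list(range(8, 513, 8))  # 67 candidate sizes
print(len(compute_aclgraph_batch_sizes(candidates, num_hidden_layers=28)))             # shallow model: longer list
print(len(compute_aclgraph_batch_sizes(candidates, num_hidden_layers=80, tp_size=2)))  # deep model + TP: shorter list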